🤗Transformers

Topic	Replies	Views	Activity
ValueError: Supplied state dict for layers does not contain `bitsandbytes__*` and possibly other `quantized_stats` (when loading a saved quantized model) 🤗Transformers	4	669	May 30, 2025
RGBA -> RGB default background color vs padding color 🤗Transformers	1	6	May 30, 2025
Why is Static Cache latency high? 🤗Transformers	2	8	May 29, 2025
Error using Trainer with Colab notebook, anyone have a solution? 🤗Transformers	1	17	May 29, 2025
LoRA training with accelerate / deepspeed DeepSpeed	3	2218	May 28, 2025
How does Q, K, V differ in LLM? 🤗Transformers	1	17	May 28, 2025
The effect of padding_side 🤗Transformers	13	14150	May 27, 2025
Prompt caching in pipelines 🤗Transformers	1	26	May 27, 2025
GETTING ERROR >> AttributeError: 'InferenceClient' object has no attribute 'post' 🤗Transformers	5	135	May 27, 2025
How does LlamaForSequenceClassification determine what class corresponds to what label? 🤗Transformers	10	4747	May 25, 2025
Best practice for usage of DataCollatorForCompletionOnlyLM in multi-turn chat 🤗Transformers	2	583	May 25, 2025
How to merge fine-tuned LLaMA-3.1-8B (via LLaMA-Factory) into a single GGUF for LM Studio? 🤗Transformers	1	20	May 25, 2025
Generate keeps increasing memory usage on Ubuntu 🤗Transformers	6	32	May 25, 2025
How does the Transformers library work under the hood? 🤗Transformers	1	15	May 22, 2025
Identical Evaluation Metrics for SFT & DPO–Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B 🤗Transformers	1	11	May 22, 2025
Create a weighted loss function to handle imbalance? 🤗Transformers	3	928	May 21, 2025
Incorrect total train batch size when using tp_size > 1 and deepspeed DeepSpeed	1	26	May 20, 2025
Distributed Training w/ Trainer 🤗Transformers	10	8765	May 20, 2025
How do I load a trained checkpoint model? 🤗Transformers	1	24	May 20, 2025
Fine-tuning on Qwen3 🤗Transformers	2	206	May 19, 2025
TokenClassificationPipeline produces entities with "##" characters 🤗Transformers	6	23	May 19, 2025
PPO Training does not improve SFT model outputs (Metrics identical before and after PPO) 🤗Transformers	1	32	May 19, 2025
Grouping by length makes training loss oscillate and makes evaluation loss worse 🤗Transformers	1	217	May 16, 2025
CUDA out of memory in SD3 🤗Transformers	4	21	May 16, 2025
StopIteration error 🤗Transformers	1	76	May 16, 2025
AttributeError: 'CustomQwen3Model' object has no attribute 'config' 🤗Transformers	1	12	May 16, 2025
How to freeze layers while fine-tuning? 🤗Transformers	2	61	May 16, 2025
Trainer default distributed training behavior 🤗Transformers	2	14	May 15, 2025
What does increasing the number of heads do in multi-head attention? 🤗Transformers	5	29670	May 15, 2025
Does high number of output labels affect the performance of BERT and how to handle the class imbalance issue while doing multi text classification? 🤗Transformers	2	415	May 14, 2025